
    Approximately Minwise Independence with Twisted Tabulation

    A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation. Hashing from a universe $[u]$, the twisted tabulation hashing of Pătraşcu and Thorup [SODA'13] makes $c=O(1)$ lookups in tables of size $u^{1/c}$. Twisted tabulation was invented to get good concentration for hashing-based sampling. Here we show that twisted tabulation yields $\tilde O(1/u^{1/c})$-minwise hashing. In the classic independence paradigm of Wegman and Carter [FOCS'79], $\tilde O(1/u^{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. Pătraşcu and Thorup [STOC'11] had shown that simple tabulation, using the same space and lookups, yields $\tilde O(1/n^{1/c})$-minwise independence, which is good for large sets but useless for small sets. Our analysis uses some of the same methods, but is much cleaner, bypassing a complicated induction argument. Comment: To appear in Proceedings of SWAT 201
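To make the construction concrete, here is a minimal Python sketch of twisted tabulation as we read it from the description above: a key is split into $c$ characters, the tables for $c-1$ of them return both a partial hash value and a "twister" character, and the XOR of the twisters is folded into the remaining character before its final lookup. Which character is twisted, the table layout, and the helpers `make_twisted_tabulation` and `min_frequency` are our own illustrative assumptions, not the authors' code; `min_frequency` simply estimates $\Pr[h(x)=\min h(S)]$ empirically over independently seeded hash functions.

```python
import random


def make_twisted_tabulation(c, char_bits, out_bits, seed=0):
    """Hypothetical sketch: twisted tabulation on keys of c characters of char_bits bits.

    Key bits above c * char_bits are ignored (assumption of this sketch).
    """
    rng = random.Random(seed)
    size = 1 << char_bits
    mask = size - 1
    # Tables for characters 1..c-1 return a pair (twister character, partial hash).
    twist_tables = [
        [(rng.getrandbits(char_bits), rng.getrandbits(out_bits)) for _ in range(size)]
        for _ in range(c - 1)
    ]
    # The final table is indexed by the head character XOR the accumulated twister.
    head_table = [rng.getrandbits(out_bits) for _ in range(size)]

    def h(x):
        chars = [(x >> (i * char_bits)) & mask for i in range(c)]
        twister, acc = 0, 0
        for table, ch in zip(twist_tables, chars[1:]):
            t, v = table[ch]
            twister ^= t
            acc ^= v
        return acc ^ head_table[chars[0] ^ twister]

    return h


def min_frequency(c, char_bits, out_bits, keys, trials):
    """Empirical frequency with which each key attains min h(S) over seeded trials."""
    counts = {k: 0 for k in keys}
    for seed in range(trials):
        h = make_twisted_tabulation(c, char_bits, out_bits, seed)
        counts[min(keys, key=h)] += 1  # ties broken by list order
    return {k: v / trials for k, v in counts.items()}
```

For example, `min_frequency(4, 8, 32, list(range(50)), 2000)` should give frequencies scattered around $1/50$. Ties between equal 32-bit hash values are broken by list order, which should rarely matter at this output width.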

    The use of FEP Teflon in solar cell cover technology

    FEP plastic film was used as a cover and as an adhesive to bond cover glasses to silicon solar cells. Various antireflective coatings were applied to cells, which were subsequently covered with FEP. Short-circuit currents were measured before and after application of the coating and of the FEP. FEP was bonded to seven of the nine differently coated cells, with no change in the total short-circuit current in four cases.

    Flexible, low-cost silicon solar cell arrays

    Silicon solar cell arrays are pressure-bonded to a flexible backing and protected by a fluorinated ethylene propylene cover in one mechanized operation. Arrays packaged by this method are flexible, lightweight, insulated, breakage resistant, and less expensive.

    Method of making silicon solar cell array

    A heat-sealable transparent plastic film, such as a fluorinated ethylene propylene copolymer, is used both as a cover material and as an adhesive for mounting a solar cell array to a flexible substrate.

    Accelerated growth in outgoing links in evolving networks: deterministic vs. stochastic picture

    In several real-world networks, such as the Internet and the WWW, the number of links grows in time in a non-linear fashion. We consider growing networks in which the number of outgoing links is a non-linear function of time but new links between older nodes are forbidden. The attachments are made using a preferential attachment scheme. In the deterministic picture, the number of outgoing links $m(t)$ at any time $t$ is taken as $N(t)^\theta$, where $N(t)$ is the number of nodes present at that time. The continuum theory predicts a power-law decay of the degree distribution, $P(k) \propto k^{-1-\frac{2}{1-\theta}}$, while the degree of the node introduced at time $t_i$ is given by $k(t_i,t) = t_i^{\theta}\left[\frac{t}{t_i}\right]^{\frac{1+\theta}{2}}$ when the network is evolved until time $t$. Numerical results show a growth in the degree distribution for small $k$ values at any non-zero $\theta$. In the stochastic picture, $m(t)$ is a random variable. As long as $\langle m(t)\rangle$ is independent of time, the network shows a behaviour similar to the Barabási-Albert (BA) model. Different results are obtained when $\langle m(t)\rangle$ is time-dependent, e.g., when $m(t)$ follows a distribution $P(m) \propto m^{-\lambda}$. The behaviour of $P(k)$ changes significantly as $\lambda$ is varied: for $\lambda > 3$, the network has a scale-free distribution belonging to the BA class, as predicted by the mean-field theory; for smaller values of $\lambda$ it shows different behaviour. Characteristic features of the clustering coefficients in both models are also discussed. Comment: Revised text, references added, to be published in PR
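The two continuum-theory results above can be recovered from a short mean-field calculation. This is our own sketch, assuming one node is added per unit time (so $N(t)=t$), $\theta<1$, that each new node arrives with degree $m(t_i)$, and that the total degree is approximately twice the accumulated number of links:

```latex
\begin{align*}
\frac{\partial k_i}{\partial t} &= m(t)\,\frac{k_i}{\sum_j k_j},
\qquad m(t) = t^{\theta},
\qquad \sum_j k_j \approx 2\int_0^t s^{\theta}\,ds = \frac{2\,t^{1+\theta}}{1+\theta}\\
\Rightarrow\quad \frac{\partial k_i}{\partial t} &= \frac{1+\theta}{2}\,\frac{k_i}{t},
\qquad k_i(t_i) = m(t_i) = t_i^{\theta}\\
\Rightarrow\quad k(t_i,t) &= t_i^{\theta}\left[\frac{t}{t_i}\right]^{\frac{1+\theta}{2}}
  = t^{\frac{1+\theta}{2}}\, t_i^{-\frac{1-\theta}{2}}\\
\Rightarrow\quad t_i &\propto k^{-\frac{2}{1-\theta}},
\qquad P(k) = \frac{1}{t}\left|\frac{\partial t_i}{\partial k}\right| \propto k^{-1-\frac{2}{1-\theta}}
\end{align*}
```

Since $t_i$ is uniformly distributed on $[0,t]$, the $1/t$ factor converts the density of arrival times into the degree distribution.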

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high-dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URLs, click-through prediction, and social networks. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail at the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph on the webspam dataset in less than 10 seconds using brute force ($n^2D$) would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
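The combination of techniques described above can be illustrated with a toy Python index. This is not FLASH's implementation. It is a minimal sketch, assuming non-empty sets of integer feature ids, that keys each of `num_tables` hash tables by `k` plain minwise hash values (standing in for the one-pass minwise hashing the abstract mentions), caps every bucket with reservoir sampling, and ranks candidates by how many tables they collide in. The class `MinHashLSHIndex` and its parameters are hypothetical names.

```python
import random
from collections import Counter, defaultdict

P = (1 << 61) - 1  # Mersenne prime for universal hashing h(x) = (a*x + b) mod P


class MinHashLSHIndex:
    """Hypothetical toy LSH index: minhash keys, reservoir-capped buckets, count-based ranking."""

    def __init__(self, num_tables, k, capacity, seed=0):
        self.rng = random.Random(seed)
        self.capacity = capacity
        # k hash functions per table; a table's key is the tuple of k minhash values.
        self.params = [
            [(self.rng.randrange(1, P), self.rng.randrange(P)) for _ in range(k)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.seen = [defaultdict(int) for _ in range(num_tables)]  # arrivals per bucket

    def _key(self, t, features):
        return tuple(min((a * f + b) % P for f in features) for a, b in self.params[t])

    def add(self, item_id, features):
        for t in range(len(self.tables)):
            key = self._key(t, features)
            self.seen[t][key] += 1
            bucket = self.tables[t][key]
            if len(bucket) < self.capacity:
                bucket.append(item_id)
            else:
                # Reservoir sampling (Algorithm R): keep the n-th arrival w.p. capacity/n.
                j = self.rng.randrange(self.seen[t][key])
                if j < self.capacity:
                    bucket[j] = item_id

    def query(self, features, top=10):
        # Count how many tables each candidate shares a bucket with the query in.
        counts = Counter()
        for t in range(len(self.tables)):
            counts.update(self.tables[t].get(self._key(t, features), []))
        return counts.most_common(top)
```

Capping buckets keeps per-bucket memory fixed regardless of how skewed the data is, while the collision count across tables serves as a cheap proxy for similarity, so no explicit similarity computation is needed at query time.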

    Quality Assessment of Linked Datasets using Probabilistic Approximation

    With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time-consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance. Comment: 15 pages, 2 figures, To appear in ESWC 2015 proceedings
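As an illustration of the Bloom filter idea mentioned above, the following Python sketch approximates one simple quality indicator, the fraction of repeated triples in a stream, using a fixed-size bit array instead of an exact set. The metric, the `BloomFilter` class and `approx_duplicate_ratio` are hypothetical examples, not Luzzu's API. Sizing uses the standard formulas $m = -n\ln p/(\ln 2)^2$ bits and $k = (m/n)\ln 2$ hash positions, derived here by double hashing a SHA-256 digest.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical minimal Bloom filter sized for expected_items at false-positive rate fp_rate."""

    def __init__(self, expected_items, fp_rate):
        self.m = max(1, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = h1 + i * h2 (mod m), from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def approx_duplicate_ratio(triples, fp_rate=0.01):
    """Hypothetical metric: approximate fraction of (s, p, o) triples already seen earlier."""
    bf = BloomFilter(len(triples), fp_rate)
    dups = 0
    for triple in triples:
        key = "\t".join(triple)  # assumes terms contain no tab characters
        if key in bf:
            dups += 1
        else:
            bf.add(key)
    return dups / len(triples)
```

Because a Bloom filter has false positives but no false negatives, this estimate can only overcount duplicates, and the overcount should stay on the order of `fp_rate` when the stream length matches the size the filter was built for.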

    Ionized dopant concentrations at the heavily doped surface of a silicon solar cell

    Data are combined with concentrations obtained by a bulk measurement method using successive layer removal with measurements of Hall effect and resistivity. From the MOS (metal-oxide-semiconductor) measurements it is found that the ionized dopant concentration $N$ has the value $(1.4 \pm 0.1) \times 10^{20}\,\mathrm{cm}^{-3}$ at distances between 100 and 220 nm from the n(+) surface. The bulk measurement technique yields average values of $N$ over layers whose thickness is 2000 nm. Results show that, at the higher concentrations encountered at the n(+) surface, the MOS C-V technique, when combined with a bulk measurement method, can be used to evaluate the effects of materials preparation methodologies on the surface and near-surface concentrations of silicon cells.

    Evaluating the social acceptability of voice based smartwatch search

    There has been a recent increase in the number of wearable devices (e.g. smartwatches, interactive glasses) available. Coupled with this, there has been a surge in the number of searches that occur on mobile devices. Given these trends, it is inevitable that search will become a part of wearable interaction. Given the form factor and display capabilities of wearables, this will probably require a different type of search interaction from what is currently used in mobile search. This paper presents the results of a user study focusing on users’ perceptions of the use of smartwatches for search. We pay particular attention to the social acceptability of different search scenarios, focusing on input method, device form and information need. Our findings indicate that audience and location heavily influence whether people will perform a voice-based search. The results will help search system developers to support search on smartwatches.

    Distribution of sizes of erased loops for loop-erased random walks

    We study the distribution of sizes of erased loops for loop-erased random walks on regular and fractal lattices. We show that for arbitrary graphs the probability $P(l)$ of generating a loop of perimeter $l$ is expressible in terms of the probability $P_{st}(l)$ of forming a loop of perimeter $l$ when a bond is added to a random spanning tree on the same graph, by the simple relation $P(l)=P_{st}(l)/l$. On $d$-dimensional hypercubical lattices, $P(l)$ varies as $l^{-\sigma}$ for large $l$, where $\sigma=1+2/z$ for $1<d<4$ and $z$ is the fractal dimension of the loop-erased walks on the graph. On recursively constructed fractals with $\tilde{d} < 2$ this relation is modified to $\sigma=1+2\bar{d}/(\tilde{d}z)$, where $\bar{d}$ is the Hausdorff dimension and $\tilde{d}$ is the spectral dimension of the fractal. Comment: 4 pages, RevTeX, 3 figures
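The quantity studied here can be sampled directly. The Python sketch below, assuming a walk on $\mathbb{Z}^d$ started at the origin and run for a fixed number of steps (rather than until it reaches a boundary), performs chronological loop erasure and records the perimeter $l$ of every erased loop, counted as the number of bonds in the closed loop. The function name `erased_loop_sizes` is our own. A histogram of its output could be compared against the $l^{-\sigma}$ tail, though this sketch does not test the relation $P(l)=P_{st}(l)/l$.

```python
import random
from collections import Counter


def erased_loop_sizes(steps, dim=2, seed=0):
    """Hypothetical sketch: histogram of erased-loop perimeters for a LERW on Z^dim."""
    rng = random.Random(seed)
    origin = (0,) * dim
    path = [origin]        # current loop-erased (self-avoiding) path
    index = {origin: 0}    # site -> position in path
    sizes = Counter()
    for _ in range(steps):
        # Simple random walk step: pick an axis and a direction uniformly.
        site = list(path[-1])
        site[rng.randrange(dim)] += rng.choice((-1, 1))
        site = tuple(site)
        if site in index:
            # Closing a loop at path[i]: bonds path[i] -> ... -> path[-1] -> path[i].
            i = index[site]
            sizes[len(path) - i] += 1
            for s in path[i + 1:]:
                del index[s]
            del path[i + 1:]
        else:
            index[site] = len(path)
            path.append(site)
    return sizes
```

On a hypercubic lattice every erased loop has even length, since the lattice is bipartite, and a step straight back counts as a loop of perimeter 2.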